Scalable Self-Tuning Implementation of Smith-Waterman Algorithm for Multicore CPUs

Authors

  • Faisal Sikder
  • Dilip Sarkar
Abstract

An improved version of the Smith-Waterman algorithm (SWA) is the most widely used method for local alignment of a pattern (or query) sequence with a database (DB) sequence. This dynamic-programming algorithm is computation intensive. To reduce the time needed to compute the alignment score matrix, parallel versions have been implemented on GPUs and multicore CPUs. These parallel versions have shown significant speedup compared with their corresponding sequential versions. Our initial evaluation of an OpenMP parallelization of SWA showed linear speedup on multicore CPUs, but a closer look at performance data from both the sequential and parallel versions revealed two undesired effects: (i) as the length of the DB sequence increases, the number of elements of the alignment score matrix H computed per unit time initially increases, then reaches a maximum, and finally decreases continuously; (ii) the DB sequence length at which the decline starts differs from CPU to CPU. To overcome this decline in computation rate, we propose a run-time self-tuning algorithm. During execution, it determines the DB sequence length, l, that maximizes the computation rate. It then divides the computation of H into the computation of a set of submatrices such that the number of columns in each submatrix is about l. Our study also found that the number of threads per core that delivers the highest computation rate differs from CPU to CPU. The proposed algorithm determines the optimal number of threads during execution and creates that many threads to achieve the highest possible computation rate. Our extensive evaluations of the proposed self-tuning algorithm on three different multicore, multi-CPU shared-memory machines have shown significant performance improvement.
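The column-wise splitting of H described in the abstract can be illustrated with a minimal, single-threaded Python sketch. It is not the authors' OpenMP implementation. It assumes a linear gap penalty and illustrative scoring values (`match=2`, `mismatch=-1`, `gap=1`), and the function names `sw_block`, `sw_score_blocked`, and `pick_width` are hypothetical. Each submatrix is computed from the last column of the submatrix to its left, and a simple timing loop picks the block width with the highest measured rate of cells per second.

```python
import time

def sw_block(query, db_block, left_col, match=2, mismatch=-1, gap=1):
    """Compute one submatrix of H column by column (linear gap penalty assumed).

    left_col is the last column of the submatrix to the left (all zeros for
    the first block). Returns this block's last column and its best score.
    """
    m = len(query)
    prev = list(left_col)            # column j-1 of H, rows 0..m
    best = 0
    for ch in db_block:
        cur = [0] * (m + 1)          # column j of H; row 0 stays 0
        for i in range(1, m + 1):
            s = match if query[i - 1] == ch else mismatch
            cur[i] = max(0,
                         prev[i - 1] + s,   # diagonal: align the two symbols
                         prev[i] - gap,     # left neighbor: gap in the query
                         cur[i - 1] - gap)  # upper neighbor: gap in the DB
            if cur[i] > best:
                best = cur[i]
        prev = cur
    return prev, best

def sw_score_blocked(query, db, width):
    """Best local alignment score, computing H in blocks of `width` columns."""
    col = [0] * (len(query) + 1)
    best = 0
    for start in range(0, len(db), width):
        col, block_best = sw_block(query, db[start:start + width], col)
        best = max(best, block_best)
    return best

def pick_width(query, db_sample, candidates):
    """Hypothetical tuner: return the candidate width with the highest
    measured rate (cells of H per second) on a sample of the DB sequence."""
    best_width, best_rate = candidates[0], 0.0
    cells = len(query) * len(db_sample)
    for width in candidates:
        t0 = time.perf_counter()
        sw_score_blocked(query, db_sample, width)
        elapsed = max(time.perf_counter() - t0, 1e-9)
        rate = cells / elapsed
        if rate > best_rate:
            best_width, best_rate = width, rate
    return best_width
```

Because each block only needs the boundary column of its left neighbor, the block width changes the memory footprint of the working set but not the resulting scores. In pure Python the measured rates will not reflect the cache effects the abstract reports; the tuner only shows the shape of a measure-then-choose step.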

Similar resources

High-fidelity simulation of collective effects in electron beams using an innovative parallel method

Among the most challenging and heretofore unsolved problems in accelerator physics is accurate simulation of the collective effects in electron beams. Electron beam dynamics is crucial to the understanding and design of: (i) high-brightness synchrotron light sources, powerful tools for cutting-edge research in physics, biology, medicine and other fields, and (ii) electron-ion particle collider...

An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs

The tile QR factorization provides an efficient and scalable way for factoring a dense matrix in parallel on multicore processors. This article presents a way of efficiently implementing the algorithm on a system with a powerful GPU and many multicore CPUs.

Revisiting the Speed-versus-Sensitivity Tradeoff in Pairwise Sequence Search

The Smith-Waterman algorithm is a dynamic programming method for determining optimal local alignments between nucleotide or protein sequences. However, it suffers from quadratic time and space complexity. As a result, many algorithmic and architectural enhancements have been proposed to solve this problem, but at the cost of reduced sensitivity in the algorithms or significant expense in hardwa...

A parallel and sensitive software tool for methylation analysis on multicore platforms

MOTIVATION: DNA methylation analysis suffers from very long processing times, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently either with the size of the dataset or...

ParaXML: A Parallel XML Processing Model on the Multicore CPUs

XML has emerged as the de facto standard interoperable data format for web services, databases, and document processing systems. The processing of XML documents, however, has been recognized as the performance bottleneck in those systems; as a result, the demand for high-performance XML processing is growing rapidly. On the hardware front, the multicore processor is increasingly becoming avai...

Publication date: 2014